The fundamental shift in high-performance computing is the move away from a CPU-centric, serial execution model toward a decoupled producer-consumer model, in which the CPU manages the flow of data while the GPU operates independently. The core realization is that the GPU is not designed to be driven as a strictly synchronous device; treating it as one creates "stop-and-wait" bottlenecks.
1. The Workflow Lifecycle
In the asynchronous mindset, developers do not wait for each task to finish. Instead, they allocate memory, launch kernels, and copy results back by placing non-blocking requests into a hardware queue.
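This lifecycle can be sketched in HIP as follows. The kernel name `scale`, the buffer sizes, and the launch geometry are illustrative assumptions, not part of the original lesson; error checking is omitted for brevity:

```cpp
#include <hip/hip_runtime.h>

// Illustrative kernel: doubles every element of a vector.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* hostBuf;
    hipHostMalloc(&hostBuf, bytes);   // pinned host memory, so async copies can truly overlap
    for (int i = 0; i < n; ++i) hostBuf[i] = 1.0f;

    float* devBuf;
    hipMalloc(&devBuf, bytes);        // allocate device memory

    hipStream_t stream;
    hipStreamCreate(&stream);         // the hardware queue we enqueue work into

    // Enqueue copy-in, kernel, copy-out: each call returns to the CPU immediately.
    hipMemcpyAsync(devBuf, hostBuf, bytes, hipMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(devBuf, n);
    hipMemcpyAsync(hostBuf, devBuf, bytes, hipMemcpyDeviceToHost, stream);

    // The CPU is free to do other work here; synchronize only before reading results.
    hipStreamSynchronize(stream);

    hipStreamDestroy(stream);
    hipFree(devBuf);
    hipHostFree(hostBuf);
    return 0;
}
```

Note that the three enqueue calls preserve order within the stream, so the kernel is guaranteed to see the copied-in data even though none of the calls block the host.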
2. Breaking the Stall
When the host is forced to synchronize after every operation, the execution gaps — the transfer time between the CPU and the GPU — become the dominant performance cost. By embracing asynchrony, the CPU keeps working while the GPU processes its stream, maximizing hardware saturation.
$$\text{Total Time} = \max(\text{CPU work}, \text{GPU work}) + \text{Sync overhead}$$
QUESTION 1
Which set of steps correctly converts a synchronous vector-add to use an explicit stream?
Call hipStreamCreate, use hipMemcpyAsync with the handle, and pass the handle as the 4th kernel launch-configuration parameter.
Call hipDeviceSynchronize after every kernel launch and use hipMemcpy.
Set the stream parameter to NULL in all hipMemcpyAsync calls.
Replace hipMalloc with hipHostMalloc exclusively.
✅ Correct!
Correct. Explicit streams require handle creation, async memory operations, and passing the handle to the kernel launch configuration.
❌ Incorrect
Using hipMemcpy (blocking) or the NULL stream (implicitly synchronous) defeats the purpose of the mindset shift.

QUESTION 2
Why is a GPU considered 'not meant to be driven as a strictly synchronous device'?
Because it has no internal clock.
Because waiting for the CPU to confirm every command leaves thousands of cores idle.
Because memory transfers cannot be tracked by the CPU.
Because the GPU must manage its own power state.
✅ Correct!
GPU efficiency comes from high-throughput parallel work; synchronizing after every small step creates 'dead air' on the hardware.
❌ Incorrect
The issue is latency and core utilization, not hardware clocking or power management.

QUESTION 3
What is the primary risk of forcing the host to synchronize after every operation?
Memory corruption.
Host-side stalling and loss of hardware saturation.
Increased power consumption on the GPU.
Kernel compile errors.
✅ Correct!
Synchronous calls block the CPU, preventing it from preparing the next 'chunk' of work for the GPU.
❌ Incorrect
While inefficient, it doesn't corrupt memory or cause compilation errors.

QUESTION 4
In the logistics warehouse analogy, what does the 'Conveyor Belt' represent?
A HIP Stream.
The GPU Driver.
The CPU Cache.
The VRAM buffer.
✅ Correct!
A stream acts like a conveyor belt, allowing the CPU to load tasks sequentially without waiting for the worker (GPU) to finish the current one.
❌ Incorrect
The stream is the FIFO queue mechanism that facilitates the non-blocking 'conveyor' flow.

QUESTION 5
True or False: hipMemcpyAsync returns control to the CPU before the data transfer is complete.
True
False
✅ Correct!
Yes! This is the definition of non-blocking. The CPU just enqueues the request and moves on.
❌ Incorrect
If it waited, it would be a standard synchronous hipMemcpy.

Case Study: The Warehouse Manager's Bottleneck
Asynchrony Implementation
A legacy ROCm application uses standard hipMemcpy and kernel launches without stream handles. The CPU utilization is 98%, but the GPU is only at 15% utilization because it waits for the CPU to finish logging data before starting the next copy.
Q
Explain how Asynchrony would fix this 'stop-and-wait' bottleneck.
Solution:
By using asynchrony, the CPU can enqueue the next data transfer and kernel launch to a HIP stream and immediately return to its logging tasks. This allows the GPU to process the stream in parallel with the CPU's logging, keeping the compute cores saturated.
Q
Provide the code required to create a stream and launch a kernel into it (replacing a default launch).
Solution:
hipStream_t myStream;
hipStreamCreate(&myStream);                    // create the explicit stream handle
myKernel<<<grid, block, 0, myStream>>>(args);  // the 4th launch-configuration parameter is the stream

Q
What function must be called to ensure the data is fully copied back to the host before the CPU accesses it?
Solution:
hipStreamSynchronize(myStream); must be called. This is the explicit 'handshake' that confirms all previous work in that specific stream is complete.
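A compact sketch of that handshake, as a self-contained program. The buffer names and sizes here are illustrative assumptions; the key point is that reading the destination buffer before the synchronize call would race with the in-flight copy:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 4 * sizeof(float);
    float host[4] = {0};
    float* dev;
    hipMalloc(&dev, bytes);
    hipMemset(dev, 0, bytes);          // give the device buffer a defined value

    hipStream_t s;
    hipStreamCreate(&s);

    // Enqueue the device-to-host copy; this call returns immediately.
    hipMemcpyAsync(host, dev, bytes, hipMemcpyDeviceToHost, s);
    // Reading host[] at this point would be a data race.

    hipStreamSynchronize(s);           // the explicit handshake: wait for the stream to drain
    std::printf("%f\n", host[0]);      // safe: the copy is guaranteed complete

    hipStreamDestroy(s);
    hipFree(dev);
    return 0;
}
```

hipStreamSynchronize waits only on the one stream, so other streams (and the CPU) can keep running; hipDeviceSynchronize, by contrast, drains the whole device.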